
Data Smashing


Abstract

Investigation of the underlying physics or biology from empirical data requires a quantifiable notion of similarity: when do two observed data sets indicate nearly identical generating processes, and when do they not? The discriminating characteristics to look for in data are often determined by heuristics designed by experts; e.g., distinct shapes of "folded" light curves may be used as "features" to classify variable stars, while determination of pathological brain states might require a Fourier analysis of brainwave activity. Finding good features is non-trivial. Here, we propose a universal solution to this problem: we delineate a principle for quantifying similarity between sources of arbitrary data streams, without a priori knowledge, features, or training. We uncover an algebraic structure on a space of symbolic models for quantized data, and show that such stochastic generators may be added and uniquely inverted, and that a model and its inverse always sum to the generator of flat white noise. Therefore, every data stream has an anti-stream: data generated by the inverse model. Similarity between two streams, then, is the degree to which one, when summed to the other's anti-stream, mutually annihilates all statistical structure to noise. We call this data smashing. We present diverse applications, including disambiguation of brainwaves pertaining to epileptic seizures, detection of anomalous cardiac rhythms, and classification of astronomical objects from raw photometry. In our examples, the data smashing principle, without access to any domain knowledge, meets or exceeds the performance of specialized algorithms tuned by domain experts.
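The annihilation idea in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not stated in the abstract: the streams are binary and memoryless, the anti-stream is taken to be a bit-flip, the "sum" keeps a symbol only where the two streams agree, and the deviation from flat white noise is measured only on single-symbol frequencies. The helper names (`anti_stream`, `stream_sum`, `deviation_from_flat`, `bernoulli_stream`) are hypothetical.

```python
import random

def anti_stream(bits):
    """Toy inversion for a binary alphabet: flip every symbol (assumption)."""
    return [1 - b for b in bits]

def stream_sum(a, b):
    """Toy sum: keep a symbol only where both streams agree at that position."""
    return [x for x, y in zip(a, b) if x == y]

def deviation_from_flat(bits):
    """Distance of the empirical frequency of 1s from the uniform value 0.5."""
    if not bits:
        return 1.0  # nothing survived the sum; treat as maximally dissimilar
    return abs(sum(bits) / len(bits) - 0.5)

def bernoulli_stream(p, n, rng):
    """Memoryless binary source emitting 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

rng = random.Random(0)
n = 200_000
s1 = bernoulli_stream(0.8, n, rng)
s2 = bernoulli_stream(0.8, n, rng)  # independent stream, same hidden source
s3 = bernoulli_stream(0.3, n, rng)  # stream from a different source

# Smash s1 against the anti-streams of s2 and s3.
same = deviation_from_flat(stream_sum(s1, anti_stream(s2)))
diff = deviation_from_flat(stream_sum(s1, anti_stream(s3)))
print(f"same source: {same:.4f}, different source: {diff:.4f}")
```

Under these assumptions, smashing two streams from the same source leaves surviving symbols that are close to evenly split, while streams from different sources leave a visibly biased residue. A measure that captures "all statistical structure", as the abstract puts it, would also need to examine longer symbol histories, which this sketch does not attempt.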

Bibliographic details

  • Year: 2014
  • Original format: PDF
  • Language: English
